Top-k Distance-Based Outlier Detection on Uncertain Data
نویسندگان
چکیده
This paper studies the problem of top-k distance-based outlier detection on uncertain data. In this work, an uncertain object is modelled by a Gaussian probability density function. Since the Naive approach is very expensive due to costly distance function between uncertain objects, a populated-cell list (PC-list) based top-k distance-based outlier detection approach is proposed in this work. Where PC-list is a sorted list of non-empty cells of a grid (grid is used to index dataset objects). Using PC-list, the top-k outlier detection algorithm needs to consider only a fraction of dataset objects and hence quickly identifies candidate objects for top-k outliers. An extensive empirical study shows that our proposed approach is effective, efficient and scalable.
منابع مشابه
A Study on Distance-based Outlier Detection on Uncertain Data
Uncertain data management, querying and mining have become important because the majority of real world data is accompanied with uncertainty these days. Uncertainty in data is often caused by the deficiency in underlying data collecting equipments or sometimes manually introduced to preserve data privacy. The uncertainty information in the data is useful and can be used to improve the quality o...
متن کاملFast Top-k Distance-Based Outlier Detection on Uncertain Data
This paper studies the problem of top-k distance-based outlier detection on uncertain data. In this work, an uncertain object is modelled by a probability density function of a Gaussian distribution. We start with the Naive approach. We then introduce a populated-cell list (PC-list), a sorted list of non-empty cells of a grid (grid is used to index our data). Using PC-list, our top-k outlier de...
متن کاملDistance-Based Outlier Detection on Uncertain Data of Gaussian Distribution
Managing and mining uncertain data is becoming important with the increase in the use of devices responsible for generating uncertain data, for example sensors, RFIDs, etc. In this paper, we extend the notion of distance-based outliers for uncertain data. To the best of our knowledge, this is the first work on distance-based outlier detection on uncertain data of Gaussian distribution. Since th...
متن کاملA New Local Distance-Based Outlier Detection Approach for Scattered Real-World Data
Detecting outliers which are grossly different from or inconsistent with the remaining dataset is a major challenge in real-world KDD applications. Existing outlier detection methods are ineffective on scattered real-world datasets due to implicit data patterns and parameter setting issues. We define a novel Local Distance-based Outlier Factor (LDOF) to measure the outlier-ness of objects in sc...
متن کاملOutlier Detection for Support Vector Machine using Minimum Covariance Determinant Estimator
The purpose of this paper is to identify the effective points on the performance of one of the important algorithm of data mining namely support vector machine. The final classification decision has been made based on the small portion of data called support vectors. So, existence of the atypical observations in the aforementioned points, will result in deviation from the correct decision. Thus...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015